Abstract

We consider a somewhat peculiar Token/Bucket problem which at first sight looks
confusing and difficult to solve. The winning approach consists in going back to
the simple, traditional methods for solving computer science problems, like
the ones taught to us by Knuth. The main trick is to specify clearly
what needs to be achieved; then the solution, even if complex, appears almost by
itself.

1 Introduction

In designing computer programs one often has to implement subtle logic
which is confusing and can look difficult to realize. But most of the time the
problem can be solved in quite simple ways. The main point of this article is that
it is often better to go back to basics, try hard to understand what it is that we
need to implement, and design a little algorithm following even old and traditional
thinking [1].

We will describe here a somewhat peculiar token/bucket algorithm,
which in the end is just another variation on an infinite theme. The key to the
solution of the problem, which can be made mathematically sound, is just a little
reasoning by examples, trying to isolate the important aspects of the problem.

The main issue is to formalize the problem in such clear terms that we can solve 
it. We'll see that once the problem has been described in a simple and precise way, 
the solution will appear almost automatically to us. 

The problem described here arose in designing an application in a distributed
environment. Today, more and more often, massive computing is approached by
using many (relatively) inexpensive machines and by running programs in parallel
on them. We are getting used to hearing about Cloud Computing, share-nothing
clusters and distributed computing architectures based on parallel processing, but we
need to prepare our data and write our programs in such a way as to make the best
use of these new architectures.

A typical problem encountered in these architectures is how to split the input data so
that it can be processed in parallel. The processing of the data usually happens in steps,
and splitting and merging of data can occur several times during the processing.

To optimize the distribution of the computational load and to minimize
the transfer of data between nodes (data transfer between physically distinct nodes
is always a very slow process compared with in-memory or even on-disk processing),
we need to be very careful about how we initially split the data between the members
of the cluster.

A possible formalization of this processing architecture, the one useful and used
here, is to represent the processing of data as happening inside buckets and the
data itself as tokens. So a cluster of machines, or a group of parallel processes, is
represented by a set of buckets in which we distribute our tokens, that is the data to
be processed. The various steps of the processing can be formalized by different sets
of buckets between which we move the tokens. Our aim is to optimize the initial
splitting of the data/tokens among the buckets.

2 A first description of the problem 

The problem at hand consists of a three-step processing where the first two steps
happen in the same set of buckets, that is a cluster of machines or a group of parallel
processes, but with an important specification, and the third step happens in a
second set of buckets. The tokens are nothing other than input data which can be
split to be processed in parallel.

Actually, to optimize the splitting of the data/tokens in the first set of buckets,
the tokens are first split among a subset of the first set of buckets, and only in the
second step are they distributed among all buckets of the first set. This peculiar process
depends on the details of the processing of the data, and it is not relevant to the
algorithmic problem we want to describe.

The setup is then the following. We have initially T tokens and we need to design 
an algorithm to put the tokens in B buckets first and in another set of B' buckets 
later. Each token has also a label which is related to the number of the token itself 
and which we will use to select the bucket where to put the token. There are a few 
constraints: 

• we should initially put the tokens only in C consecutive buckets (C < B) out
of the B buckets of the first set, and the first bucket of the C set can be any
bucket in B (so we count the B buckets mod B, as if they made up a ring)

• the tokens should be distributed homogeneously in the buckets, both with respect
to the number of tokens in the C buckets and to the distribution of the
labels in the C buckets

• in the second step, the tokens will be redistributed to the B - C buckets
which have been left out in the first step, with the constraint that each token
can be moved only once and that tokens cannot be redistributed between the
C buckets already filled in; still, the final distribution of tokens must remain
homogeneous between all buckets both in number and label

• finally the tokens will be moved to the second set of B' buckets, with B' > B,
and they must be distributed homogeneously without changing the label of
each token.
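As a small illustration of the first constraint, the C consecutive buckets on the
ring can be enumerated as in the following sketch. The function name c_set and the
example values are ours and are used only for illustration; the first bucket of the
C set is passed as the parameter f.

```python
def c_set(f, C, B):
    """Return the C consecutive bucket numbers starting at f, counted mod B."""
    if not (0 < C <= B and 0 <= f < B):
        raise ValueError("need 0 < C <= B and 0 <= f < B")
    return [(f + i) % B for i in range(C)]


# With B = 5 buckets, C = 3 and first bucket 3, the C set wraps around the ring.
print(c_set(3, 3, 5))
```

In this example the C set consists of the buckets 3, 4 and 0, and the buckets 1 and 2
are the B - C buckets left out in the first step.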

3 Discussing examples 

Even if this formulation of the problem gives us an idea of what we need to do, it
is not precise enough to allow us to find an algorithm which solves it. To solve the
problem we need a much more precise description of it, and for that we
need to go back to what we want to realize and describe it in more detail.
The best way is to try first with a couple of examples.

The first case is when C = 1. This is an exceptional case where all tokens go in
the same bucket, and it does not teach us much. Anyway, we still have to keep in
mind that our final algorithm must also work when C = 1.

The first interesting case is instead when C = B. We can start by putting the first
token in the first bucket, the second token in the second bucket, and continue like
that (mod B). This is a round-robin distribution. In this case we do not need to
redistribute the tokens in the first set of buckets, since all the buckets are already
in use.

Notice that the number of tokens in each bucket differs at most by 1 from the 
number of tokens in each other bucket. We can say that this is our homogeneity 
property with respect to the number of tokens in the buckets. 

The first attempt to define the label is to simply set it to the number of the
bucket where we put the token. This, however, will not work when we put the
tokens in the second set of buckets, since we have B' buckets in this case. Instead
we can set l(t) = t, where l is the label and t is the number of the token, and choose
the bucket where to put a token with l(t) mod B for the first set of buckets, and
l(t) mod B' for the second set of buckets.

But we could start to put the tokens not in the first bucket, but in any bucket.
Let us denote by f the bucket in the first set of buckets where we put the first
token. Then we need to modify the l map to make it work when f ≠ 0, but the
solution is quite simple: l(t) = t + f. The maps to select the bucket where to put
each token are still l(t) mod B for the first set of buckets, and l(t) mod B' for the
second set of buckets.
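The case C = B with a shifted starting bucket can be tried out with a few lines of
code. The following is a minimal sketch; the function names and the numerical values
(T = 11, B = 4, B' = 6, f = 2) are ours and chosen only for illustration.

```python
from collections import Counter


def label(t, f):
    """Label of token t when the first token goes into bucket f."""
    return t + f


def bucket_counts(T, n_buckets, f):
    """Count how many of the T tokens land in each bucket via label mod n_buckets."""
    counts = Counter(label(t, f) % n_buckets for t in range(T))
    return [counts[b] for b in range(n_buckets)]


# Illustrative values: T = 11 tokens, B = 4, B' = 6, first bucket f = 2.
first = bucket_counts(11, 4, 2)   # first set of buckets (case C = B)
second = bucket_counts(11, 6, 2)  # second set of buckets
print(first, max(first) - min(first))
print(second, max(second) - min(second))
```

In both sets the labels are consecutive integers, so the number of tokens per bucket
differs at most by 1.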

Now what to do when 1 < C < B? We can start by assigning l(t) = t + f and
put the first C tokens as before; this works fine. The problem comes with the next
B - C tokens: where do we put them? The map l(t) mod B does not work because
it selects a bucket not in the set of the C buckets to fill in at first. On the other
hand, the definition of l and the two maps to select the bucket where to put a token
will work fine when we have to move the tokens to the B - C empty buckets
of the first set, and later on to the B' buckets of the second set. So we keep l and
the two maps as they are.

What we can do in this case instead is to have two independent round-robin
cycles, one for the tokens where l(t) mod B is one of the buckets we should fill in,
and one for the other tokens. The two round-robin cycles run independently and fill
in the same buckets. If the two round-robin cycles run in the same direction, at the
end we can have buckets whose number of tokens differs by 2 from the number
of tokens in other buckets (if this is not obvious, we will see it in detail later on).
But there is a simple way to avoid this, which is to run the two round-robin cycles in
opposite directions, one clockwise and the other counter-clockwise (assuming that
the buckets make a circle mod C, that is they form a ring).
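The effect of the direction of the two cycles can be seen in a small simulation.
This is only a sketch under our own assumptions: the C buckets are numbered 0 to C - 1
relative to the first one, a token belongs to the first cycle when t mod B < C (which
for l(t) = t + f is the same as l(t) mod B being one of the C buckets), and in the
"opposite" variant it is the first cycle that runs backwards. The function name and
the example values are ours.

```python
def fill_first_step(T, B, C, opposite=True):
    """Fill C buckets (numbered 0..C-1 relative to the first one) with T tokens.

    Tokens with t mod B < C follow the first cycle, the others follow the
    second cycle driven by the pointer r. Returns the token count per bucket.
    """
    counts = [0] * C
    r = 0  # pointer of the second round-robin cycle
    for t in range(T):
        j = t % B
        if j < C:
            # first cycle: runs backwards if opposite, forwards otherwise
            counts[C - 1 - j if opposite else j] += 1
        else:
            counts[r] += 1
            r = (r + 1) % C
    return counts


# Illustrative values: T = 5 tokens, B = 4, C = 3.
same = fill_first_step(5, 4, 3, opposite=False)
opp = fill_first_step(5, 4, 3, opposite=True)
print(same, max(same) - min(same))
print(opp, max(opp) - min(opp))
```

With these values the same-direction variant ends with 3, 1 and 1 tokens in the three
buckets (a difference of 2), while the opposite-direction variant ends with 2, 1 and 2.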

4 A formal description 

Now the description of the algorithm is getting a little complicated, so we need to 
formalize it to make it explicit and to be able to prove the correctness of its solution. 

We can then try with the following formal description of the problem. 

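Given an ordered set $\mathcal{B} = \{b_0, b_1, \ldots, b_{B-1}\}$ of $B$ buckets, an
ordered set $\mathcal{B}' = \{b'_0, \ldots, b'_{B'-1}\}$ of $B' > B$ buckets, an ordered
subset $\mathcal{C} = \{b_f, b_{f+1 \bmod B}, \ldots, b_{f+C-1 \bmod B}\}$
of $\mathcal{B}$ with $C$ buckets, and an ordered set $\mathcal{T} = \{t_0, \ldots, t_{T-1}\}$
of $T$ tokens, let $I_B$, $I_{B'}$, $I_C$ and $I_T$ be the ordered sets of integers given
by the indices of the elements of $\mathcal{B}$, $\mathcal{B}'$, $\mathcal{C}$ and
$\mathcal{T}$ respectively.

Assign to each token $t_j$ a label $l$ by the map $L : \mathcal{T} \to \mathcal{L}$, where
$\mathcal{L}$ is isomorphic to $\mathcal{T}$, and $l : I_T \to I_{\mathcal{L}}$. In practice
we choose $I_{\mathcal{L}}$ to be the same set of integers as $I_T$ up to a constant shift $f$
and a little empty interval towards the high end of the set, as described below.

Let $M$ be a map $M : \mathcal{T} \to \mathcal{C}$, $m$ the associated map
$m : I_{\mathcal{L}} \to I_C$, $M' : \mathcal{T} \to \mathcal{B}'$,
$m' : I_{\mathcal{L}} \to I_{B'}$, $M'' : \mathcal{T} \to \mathcal{B}$, and
$m'' : I_{\mathcal{L}} \to I_B$.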

We summarize our requirements: 

1. $L$ is a one-to-one map

2. at the end of filling up the first $C$ buckets, the number of tokens in each bucket
can differ at most by 1 from the number of tokens in every other bucket

3. let $p(q)$ be the number of tokens with $q = l(t) \bmod B$; at the end of filling up
the first $C$ buckets each $p(q)$ can differ at most by 1 from every other $p(q')$

4. to fill up the $B - C$ empty buckets, tokens can be moved only once from the $C$
buckets to the $B - C$ empty buckets; no redistribution between the $C$ buckets
is allowed

5. at the end of redistributing the tokens in all the $B$ buckets, the number of
tokens in each bucket can differ at most by 1 from the number of tokens in
every other bucket, and for all tokens in a bucket $b = l(t) \bmod B$ must hold,
where $b$ is the number of the bucket

6. after redistributing the tokens in the final $B'$ buckets, the number of tokens in
each bucket can differ at most by 1 from the number of tokens in every other
bucket, and for all tokens in a bucket $b' = l(t) \bmod B'$ must hold, where $b'$ is the
number of the bucket.

For implementation reasons, we add the following constraint on the solution:
of the two round-robin cycles of the first step, the one for the tokens with
$l(t) \bmod B$ not in $I_C$ should run in the direction of increasing values of $t$.

5 Finding the solution



First we need to determine the map $m$ in the general case. It is obvious that
$m(t) = l(t) \bmod B$ if $C = B$. In the general case, if $(t + f) \bmod B \in I_C$ then
we set $m(t) = l(t) \bmod B$. This already satisfies part of requirement 5. The simplest
choice of $l(t)$ would be $l(t) = t + f$. But we should recall that we need two round-robin
cycles running in opposite directions and that, due to the above-mentioned constraint,
this round-robin cycle should run in the direction opposite to that of increasing values
of $t$. Practically, this means that we would like to have something like
$l(0) = f + C - 1$, $l(1) = f + C - 1 - 1$, $\ldots$, $l(C-1) = f$; that is, something like
$l(t) = f + C - 1 - t$.

But this obviously cannot be right, since it contains $-t$ and then $l$ would decrease
steadily instead of increasing in jumps.

So we should use instead a more complicated function which should satisfy:

$$l(0) = f + C - 1, \quad l(1) = f + C - 1 - 1, \quad \ldots, \quad l(C-1) = f,$$

and then

$$l(B) = f + B + C - 1,$$
$$l(B+1) = f + B + C - 1 - 1 = f + (B+1) + C - 1 - 2,$$
$$\ldots$$
$$l(B+C-1) = f + B = f + (B+C-1) + C - 1 - 2(C-1),$$

which is given by

$$l(t) = f + t + C - 1 - 2\,(t \bmod B).$$
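The values listed above can be checked directly against the closed formula. The
following is a minimal sketch; the function name first_cycle_label and the numerical
values f = 2, B = 5, C = 3 are ours and serve only as an illustration.

```python
def first_cycle_label(t, f, B, C):
    """l(t) = f + t + C - 1 - 2 * (t mod B), for tokens with t mod B < C."""
    if t % B >= C:
        raise ValueError("token belongs to the second cycle")
    return f + t + C - 1 - 2 * (t % B)


# Illustrative values: f = 2, B = 5, C = 3.
f, B, C = 2, 5, 3
for t in (0, 1, C - 1, B, B + 1, B + C - 1):
    l = first_cycle_label(t, f, B, C)
    print(t, l, l % B)  # token number, label, bucket l mod B
```

For these values the labels of the first block are 4, 3, 2 and those of the second
block are 9, 8, 7, so in both blocks the buckets are visited in the order 4, 3, 2,
that is from f + C - 1 down to f.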



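Now with this definition of $l(t)$, the map $m(t) = l(t) \bmod B$ puts a token with
label $l$ in the bucket number $m$, which goes from position $C - 1$ to position $0$ of
the $\mathcal{C}$ set (that is, from bucket $f + C - 1$ down to bucket $f$, mod $B$) when
$t$ goes from $0$ to $C - 1$. This works for all tokens such that
$(f + t) \bmod B \in I_C$.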

For the tokens that should go in the $B - C$ buckets we are not filling in the first
round, we adopt a simple round-robin scheduling. We set $l(t) = f + t$ and use a
counter $r$ which increases independently every time one of these tokens is put in one
of the $C$ buckets. The counter $r$ is an integer from $0$ to $C - 1$, counted mod $C$,
and it points to the bucket $f + r$. In practice we put the token $C$ in the bucket $f$
and increase $r$ by 1, the token $C + 1$ in the bucket $f + 1$, $\ldots$, the token
$B - 1$ in the bucket $f + ((B - 1 - C) \bmod C)$ (all bucket numbers are taken mod $B$),
always increasing $r$ by 1 mod $C$. After that we go back to the first cycle, since for
the next token it holds that $(f + t) \bmod B \in I_C$. When we get to the token
$t = B + C$, the next one in the second cycle, we should put it in the next bucket
pointed to by $r$, and continue like that.

After this, we claim that the next two reshufflings of tokens are simply given by
the maps $m''(t) = l(t) \bmod B$ and $m'(t) = l(t) \bmod B'$.
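The construction described so far can be put together in a short sketch. It is not
meant as a definitive implementation: the function names distribute, m_double_prime
and m_prime and the example values are ours, the counter r is kept as local state,
and the test t mod B < C is used as an equivalent form of the condition that
(f + t) mod B is one of the C buckets, assuming 0 <= f < B.

```python
def distribute(T, B, C, f):
    """Return a list of (label, first_bucket) pairs, one per token t = 0..T-1."""
    result = []
    r = 0  # counter of the second round-robin cycle, counted mod C
    for t in range(T):
        if t % B < C:
            # first cycle: the bucket decreases from f + C - 1 down to f
            l = f + t + C - 1 - 2 * (t % B)
            m = l % B
        else:
            # second cycle: plain label, bucket chosen by the pointer r
            l = f + t
            m = (f + r) % B
            r = (r + 1) % C
        result.append((l, m))
    return result


def m_double_prime(l, B):
    """Bucket in the first set after the second step."""
    return l % B


def m_prime(l, B_prime):
    """Bucket in the second set of B' buckets."""
    return l % B_prime


# Illustrative values: T = 7, B = 5, C = 2, f = 4, B' = 6 (the C set is {4, 0}).
for t, (l, m) in enumerate(distribute(7, 5, 2, 4)):
    print(t, l, m, m_double_prime(l, 5), m_prime(l, 6))
```

In this example every token is first placed in bucket 4 or bucket 0; the tokens of the
first cycle already sit in the bucket given by the label mod B, while the tokens of the
second cycle are the ones that will later move to the buckets 1, 2 and 3.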

Let's summarize the solution we found. To specify the solution we need the 
definition of four maps: m(t), m"{t), m'{t) and l{t). We have 



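$$m(t) = \begin{cases} l(t) \bmod B & \text{if } (f + t) \bmod B \in I_C \\ (f + r) \bmod B & \text{otherwise} \end{cases}$$

$$m''(t) = l(t) \bmod B$$

$$m'(t) = l(t) \bmod B'$$

$$l(t) = \begin{cases} f + t + C - 1 - 2\,(t \bmod B) & \text{if } (f + t) \bmod B \in I_C \\ f + t & \text{otherwise} \end{cases}$$

where $r$ is the counter of the second round-robin cycle, counted mod $C$.

6 Verifying the solution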



To check whether this solution satisfies our requirements we will show how each of
the steps of distributing the tokens in the buckets works. We are not going to give a
formal mathematical proof here: it is not too difficult, but it is lengthy and not
very illuminating.

First of all, requirement 1., L is a one-to-one map, is automatically satisfied by 
the definition of the map. 

In the first step we put tokens only in the $C$ buckets of the $\mathcal{C}$ set. The
condition $(f + t) \bmod B \in I_C$ and the round-robin distribution of the other tokens
guarantee that all tokens are put only in the buckets of the $\mathcal{C}$ set. Moreover
the tokens with $(f + t) \bmod B \in I_C$ are put in the correct bucket also for the
second distribution, that is they will not be moved in the second distribution.

Let's consider requirement 2. When $(f + t) \bmod B \in I_C$ we put the tokens in the
buckets $f + C - 1, \ldots, f$. When we have finished the first $C$ tokens, we start
putting the next $B - C$ tokens in $f$, $f + 1$, etc. When we finish the tokens with
$(f + t) \bmod B \notin I_C$ we have in general at most a difference of 1 in the number
of tokens in the buckets of the $\mathcal{C}$ set. Then we start again with the first
cycle and we put exactly one token per bucket in the $\mathcal{C}$ set. And we continue
like this. The point to discuss is what happens at the end.

If the last token has $(f + t) \bmod B \notin I_C$ then, by what we just said, the
difference in the number of tokens in the buckets of the $\mathcal{C}$ set is at most 1.
If instead $(f + t) \bmod B \in I_C$ then the situation is more complex. If we put the
last token in the bucket $f$, then again the difference in the number of tokens in the
buckets of the $\mathcal{C}$ set is at most 1, as before.

Otherwise denote by $z$ the bucket in which we put the last token of the second
(round-robin) cycle and by $y$ the bucket in which we put the very last token. If
$z + 1 = y$ (remember the direction of filling of the two cycles) then all buckets in
the $\mathcal{C}$ set have the same number of tokens. If $z + 1 < y$ then the buckets in
the $\mathcal{C}$ set with number in between $z$ and $y$ have one token less than the
others. If instead $z + 1 > y$ then the buckets in the $\mathcal{C}$ set with number in
between $z$ and $y$ have one token more than the others.

Notice that in the case in which the last token is in the first cycle and is not
in the bucket $f$, the list of integer numbers $l(t)$ has a gap. Indeed $l(t)$ starts at
$f$, but if we write the last value as $N * B + M$ then the values $N * B$, $N * B + 1$,
$\ldots$, $N * B + M - 1$ are missing from the set $I_{\mathcal{L}}$ of labels.
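The gap can be made visible on a small example. The sketch below is ours, with
hypothetical function names, and uses f = 0 so that the missing values can be read off
directly in the form N * B, ..., N * B + M - 1.

```python
def labels(T, B, C, f):
    """Labels l(t) for t = 0..T-1 following the two-cycle definition."""
    out = []
    for t in range(T):
        j = t % B
        out.append(f + t + C - 1 - 2 * j if j < C else f + t)
    return out


def missing_labels(T, B, C, f):
    """Integers between the smallest and largest label that no token carries."""
    used = set(labels(T, B, C, f))
    return [x for x in range(min(used), max(used) + 1) if x not in used]


# Illustrative values with f = 0: T = 5, B = 4, C = 3.
# The last token (t = 4) is in the first cycle and lands in bucket 2, not in f = 0.
print(sorted(labels(5, 4, 3, 0)))
print(missing_labels(5, 4, 3, 0))
```

Here the labels are 0, 1, 2, 3 and 6. The last value is 6 = 1 * 4 + 2, so N = 1 and
M = 2, and the values 4 and 5 are missing. With T = 7 the first cycle of the last block
is complete, the last token lands in the bucket f, and there is no gap.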

In conclusion the difference in number of tokens in the C set is always at most 
1. This shows that requirement 2. is satisfied. 
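The argument for requirement 2 can also be cross-checked by brute force on small
cases. The following sketch is ours (function names and parameter ranges are arbitrary
choices); it is a numerical check on a finite range and not a replacement for a proof.

```python
from itertools import product


def first_step_counts(T, B, C, f):
    """Number of tokens in each bucket of the first set after the first step."""
    counts = [0] * B
    r = 0
    for t in range(T):
        j = t % B
        if j < C:
            counts[(f + C - 1 - j) % B] += 1   # first cycle, decreasing bucket
        else:
            counts[(f + r) % B] += 1           # second cycle, pointer r
            r = (r + 1) % C
    return counts


def requirement_2_holds(T, B, C, f):
    """True if only the C set is used and its counts differ at most by 1."""
    counts = first_step_counts(T, B, C, f)
    in_c = [counts[(f + i) % B] for i in range(C)]
    outside_empty = sum(counts) == sum(in_c)
    return outside_empty and max(in_c) - min(in_c) <= 1


# Exhaustive check over a small, arbitrarily chosen range of parameters.
ok = all(
    requirement_2_holds(T, B, C, f)
    for B, T in product(range(1, 9), range(0, 40))
    for C in range(1, B + 1)
    for f in range(B)
)
print(ok)
```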

Requirement 3. is again automatically satisfied by our definition of L. 

From the definition of $l$ and $m''$ it follows that only the tokens distributed in
the second round-robin cycle are moved in the second step; they are moved once
and directly to their final bucket in the $\mathcal{B}$ set. This shows that
requirement 4. is satisfied.
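The second step can be checked numerically in the same spirit. The sketch below, with
function names and parameter ranges of our own choosing, verifies on small cases that a
token either stays where it is or moves to a bucket outside the C set, and that after
the move the number of tokens per bucket in the first set differs at most by 1, as
asked in requirement 5.

```python
def plan(T, B, C, f):
    """Per-token tuples (label, bucket after step 1, bucket after step 2)."""
    out, r = [], 0
    for t in range(T):
        j = t % B
        if j < C:
            l = f + t + C - 1 - 2 * j
            first = l % B
        else:
            l = f + t
            first = (f + r) % B
            r = (r + 1) % C
        out.append((l, first, l % B))
    return out


def second_step_ok(T, B, C, f):
    """Check the move rule of step 2 and the final homogeneity in number."""
    c_buckets = {(f + i) % B for i in range(C)}
    final = [0] * B
    for l, first, second in plan(T, B, C, f):
        final[second] += 1
        # a token either stays put or moves to a bucket outside the C set
        if second != first and second in c_buckets:
            return False
    return max(final) - min(final) <= 1


# Small exhaustive check over arbitrarily chosen parameter ranges.
print(all(
    second_step_ok(T, B, C, f)
    for B in range(1, 8)
    for C in range(1, B + 1)
    for f in range(B)
    for T in range(0, 30)
))
```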

Since $L$ is a one-to-one map, from the definition of the $m''$ map it follows
that requirement 5. is also satisfied.

Finally, again due to the definition of $L$ and of the $m'$ map, requirement 6.
is also satisfied.

7 Concluding remarks

The solution we described for the problem at hand is one solution, most probably 
it is not the only one. For example, without the constraint on the direction of the 
two round-robin cycles, we could choose the opposite direction for them and get a 
different solution. 

Moreover, we do not claim that this is the simplest or most efficient solution,
but it is one which in practice was easy to implement using a couple of pointers.